WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)



(51) International Patent Classification:
C12Q 1/37, G01N 33/68, C07K 1/04

A1

(11) International Publication Number: WO 98/32876

(43) International Publication Date: 30 July 1998 (30.07.98)



(21) International Application Number: PCT/GB98/00201

(22) International Filing Date: 23 January 1998 (23.01.98)



(30) Priority Data:
9701357.7    23 January 1997 (23.01.97)     GB
9726947.6    19 December 1997 (19.12.97)    GB



(71) Applicant (for all designated States except US): BRAX
GENOMICS LIMITED [GB/GB]; 13 Station Road, Cambridge
CB1 2JB (GB).

(72) Inventors; and

(75) Inventors/Applicants (for US only): SCHMIDT, Günter
[DE/GB]; Houghton Manor, Houghton, Cambs PE17 2BQ
(GB). THOMPSON, Andrew, Hugin [GB/GB]; 25 Knoll
Park, Alloway, Ayr KA7 4RH (GB).

(74) Agents: DANIELS, Jeffrey, Nicholas et al.; Page White &
Farrer, 54 Doughty Street, London WC1N 2LS (GB).



(81) Designated States: AL, AM, AT, AU, AZ, BA, BB, BG, BR,
BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE,
GH, HU, IL, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR,
LS, LT, LU, LV, MD, MG, MK, MN, MW, MX, NO, NZ,
PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT,
UA, UG, US, UZ, VN, YU, ZW, ARIPO patent (GH, GM,
KE, LS, MW, SD, SZ, UG, ZW), Eurasian patent (AM, AZ,
BY, KG, KZ, MD, RU, TJ, TM), European patent (AT, BE,
CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL,
PT, SE), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN,
ML, MR, NE, SN, TD, TG).



Published 

With international search report 

Before the expiration of the time limit for amending the
claims and to be republished in the event of the receipt of 



(54) Title: CHARACTERISING POLYPEPTIDES
(57) Abstract

A method for characterising polypeptides, which comprises: (a) treating a sample comprising a population of one or more polypeptides
with a cleavage agent which is known to recognise in polypeptide chains a specific amino acid residue or sequence and to cleave at a
cleavage site, whereby the population is cleaved to generate peptide fragments; (b) isolating a population of the peptide fragments which
bear at one end a reference terminus comprising either only a C-terminus or only an N-terminus and which bear at the other end the
cleavage site proximal to the reference terminus; and (c) determining a signature sequence of at least some of the isolated fragments, which
signature sequence is the sequence of a predetermined number of amino acid residues running from the cleavage site; wherein the signature
sequence and the relative position of the cleavage site to the reference terminus characterise the or each polypeptide.






FOR THE PURPOSES OF INFORMATION ONLY

Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.



AL  Albania
AM  Armenia
AT  Austria
AU  Australia
AZ  Azerbaijan
BA  Bosnia and Herzegovina
BB  Barbados
BE  Belgium
BF  Burkina Faso
BG  Bulgaria
BJ  Benin
BR  Brazil
BY  Belarus
CA  Canada
CF  Central African Republic
CG  Congo
CH  Switzerland
CI  Côte d'Ivoire
CM  Cameroon
CN  China
CU  Cuba
CZ  Czech Republic
DE  Germany
DK  Denmark
EE  Estonia
ES  Spain
FI  Finland
FR  France
GA  Gabon
GB  United Kingdom
GE  Georgia
GH  Ghana
GN  Guinea
GR  Greece
HU  Hungary
IE  Ireland
IL  Israel
IS  Iceland
IT  Italy
JP  Japan
KE  Kenya
KG  Kyrgyzstan
KP  Democratic People's Republic of Korea
KR  Republic of Korea
KZ  Kazakstan
LC  Saint Lucia
LI  Liechtenstein
LK  Sri Lanka
LR  Liberia
LS  Lesotho
LT  Lithuania
LU  Luxembourg
LV  Latvia
MC  Monaco
MD  Republic of Moldova
MG  Madagascar
MK  The former Yugoslav Republic of Macedonia
ML  Mali
MN  Mongolia
MR  Mauritania
MW  Malawi
MX  Mexico
NE  Niger
NL  Netherlands
NO  Norway
NZ  New Zealand
PL  Poland
PT  Portugal
RO  Romania
RU  Russian Federation
SD  Sudan
SE  Sweden
SG  Singapore
SI  Slovenia
SK  Slovakia
SN  Senegal
SZ  Swaziland
TD  Chad
TG  Togo
TJ  Tajikistan
TM  Turkmenistan
TR  Turkey
TT  Trinidad and Tobago
UA  Ukraine
UG  Uganda
US  United States of America
UZ  Uzbekistan
VN  Viet Nam
YU  Yugoslavia
ZW  Zimbabwe





WO 98/32876 PCT/GB98/00201



CHARACTERISING POLYPEPTIDES
Field of the Invention

The present invention relates to a method for characterising
polypeptides and to methods for identifying and assaying such
polypeptides.

Background to the Invention 

The characterisation and identification of polypeptides from 
complex mixtures thereof, such as protein samples found in 
biological systems, is a well-known problem in biochemistry.
Traditional methods involve a variety of liquid phase 
fractionation and chromatography steps followed by 
characterisation, for example by two dimensional gel 
electrophoresis. Such methods are prone to artefacts and are 
inherently slow. Moreover, automation of such methods is 
extremely difficult. 

Patent Application PCT/GB97/02403, filed on 5th September 1997,
describes a method for profiling a cDNA population in order to 
generate a 'signature' for every cDNA in the population. It is 
assumed in that method that a short sequence of about 8 bp that 
is determined with respect to a fixed reference point is 
sufficient to identify almost all genes. This system relies on 
immobilising the cDNA population at the 3' terminus and cleaving 
it with a restriction endonuclease. This leaves a population of 
3' restriction fragments. The patent describes a technique that 
allows one to determine a signature of roughly 8 to 10 base pairs 
at a specified number of bases from the restriction site which 
is a sufficient signature to identify nearly all genes. 

Techniques for profiling proteins, that is to say cataloguing the 
identities and quantities of proteins in a tissue, are less well 
developed in terms of automation or high throughput. The 
classical method of profiling a population of proteins is by two- 
dimensional electrophoresis. In this method a protein sample 
extracted from a biological sample is separated on a narrow gel
strip. This first separation usually separates proteins on the
basis of their iso-electric point. The entire gel strip is then 
laid against one edge of a rectangular gel. The separated 
proteins in the strip are then electrophoretically separated in 
the second gel on the basis of their size. This technology is 
slow and very difficult to automate. It is also relatively 
insensitive in its simplest incarnations. A number of 
improvements have been made to increase resolution of proteins 
by 2-D gel electrophoresis and to improve the sensitivity of the 
system. One method to improve the sensitivity of 2-D gel
electrophoresis and its resolution is to analyse the protein in 
specific spots on the gel by mass spectrometry. One such method 
is in-gel tryptic digestion followed by analysis of the tryptic 
fragments by mass spectrometry to generate a peptide mass 
fingerprint. If sequence information is required, tandem mass 
spectrometry analysis can be performed. 

More recently attempts have been made to exploit mass 
spectrometry to analyse whole proteins that have been 
fractionated by liquid chromatography or capillary 
electrophoresis. In-line systems exploiting capillary 
electrophoresis mass spectrometry have been tested. The analysis 
of whole proteins by mass spectrometry, however, suffers from a 
number of difficulties. The first difficulty is the analysis of 
the complex mass spectra resulting from multiple ionisation
states accessible to individual proteins. The second major
disadvantage is that the mass resolution of mass spectrometers
is at present quite poor for high molecular weight species, i.e.
for ions that are greater than about 4 kilodaltons in mass, so
resolving proteins that are close in mass is difficult. A third
disadvantage is that further analysis of whole proteins by tandem 
mass spectrometry is difficult as the fragmentation patterns for 
whole proteins are extremely complex. 



Summary of the Invention

The present invention provides a method for characterising 
polypeptides, which comprises:

(a) treating a sample comprising a population of one or more
polypeptides with a cleavage agent which is known to recognise
in polypeptide chains a specific amino acid residue or sequence 
and to cleave at a cleavage site, whereby the population is 
cleaved to generate peptide fragments; 

(b) isolating a population of the peptide fragments which bear 
at one end a reference terminus comprising either only a C- 
terminus or only an N-terminus and which bear at the other end 
the cleavage site proximal to the reference terminus; and 

(c) determining a signature sequence of at least some of the 
isolated fragments, which signature sequence is the sequence of 
a predetermined number of amino acid residues running from the 
cleavage site; 

wherein the signature sequence and the relative position of the 
cleavage site to the reference terminus characterise the or each 
polypeptide. 

The invention therefore describes a system analogous to that of 
PCT/GB97/02403, but for use with proteins. Since there are 20 
monomers that make up a protein there are a great many more 
possible variants at a particular site in a sequence and so the 
length of signature required from a protein sequence is much 
shorter than that required from a cDNA sequence to identify it 
uniquely. 
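
The arithmetic behind this comparison can be sketched as follows. This is an illustration only, under the simplifying assumption (not made in the text) that monomers occur independently and with equal frequency, so that a signature of length k over an alphabet of size a distinguishes at most a^k sequences. The function name is hypothetical, and the figure of 15000 species is the estimate of expressed genes per human cell quoted elsewhere in this description.

```python
def min_signature_length(alphabet_size: int, n_species: int) -> int:
    """Smallest k such that alphabet_size**k >= n_species.

    Assumes every monomer is equally likely at every position, which
    real sequences are not, so this is a lower bound on the length.
    """
    k = 0
    capacity = 1
    while capacity < n_species:
        capacity *= alphabet_size
        k += 1
    return k


# Four nucleotides versus twenty amino acids, for about 15000 species.
print(min_signature_length(4, 15000))   # 7, since 4**7 = 16384
print(min_signature_length(20, 15000))  # 4, since 20**4 = 160000
```

Because real residue frequencies are uneven and related proteins share sequence, a practical signature would probably need to be somewhat longer than this bound.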

This invention can use liquid phase separation techniques and 
mass spectrometry to resolve proteins and protein fragments to 
facilitate automation and avoid the artefacts and inherent 
slowness and lack of automation in gel based techniques such as 
2-D gel electrophoresis. 



The reference terminus may be attached to a solid phase support 
to immobilise the population of polypeptides or peptide fragments
thereof. Preferably, the population of polypeptides is
immobilised before treatment with the cleavage agent. In this 
way, the peptide fragments produced on treatment with the 
cleavage agent remain immobilised and can be readily isolated by 
washing away unwanted material present in the liquid phase. The 
solid phase support may comprise suitable beads or other such 
supports well known in this art. Such supports or substrates may 
be chosen to bind selectively to either the N-terminus or the C- 
terminus and this is discussed in further detail below. 

In one embodiment, the reference terminus is attached to the 
solid phase support by: (i) treating the polypeptides with a 
blocking agent to block all exposed reference groups, which 
comprise either carboxyl groups or primary amine groups; 
(ii) cleaving the reference terminal amino acids to expose
unblocked reference termini; and (iii) treating the unblocked
reference termini with an immobilisation agent capable of
coupling to the solid phase support; wherein step (b) comprises
binding the treated reference termini to the solid phase support
and removing unbound peptide fragments. In an alternative
embodiment, the method further comprises (i) preparing the sample
of step (a) by pre-treating the polypeptides with a blocking agent
to block all exposed reference groups, which comprise either
carboxyl groups or primary amine groups, so that subsequent
treatment of the sample with the cleavage agent generates peptide
fragments bearing unblocked reference termini; (ii) biotinylating
the unblocked reference termini; and (iii) binding the peptide
fragments containing the unblocked reference termini to a solid 
phase support; wherein step (b) comprises eluting unbound peptide 
fragments therefrom. Preferably, the immobilisation agent 
comprises a biotinylation agent. 

The cleavage agent must recognise a specific amino acid residue 
or sequence of amino acids reliably. The cleavage site may be 
at the specific amino acid residue or sequence or at a known 
displacement therefrom. The cleavage agent may be a chemical 
cleavage agent such as cyanogen bromide. Preferably, the
cleavage agent is a peptidase, such as a serine protease,
preferably trypsin.

As discussed in further detail below, depending on the number of 
proteins or polypeptides in a given sample, it may be 
advantageous to sort the polypeptides into manageable sub-
populations. Sorting can be effected before treatment of the
sample with the cleavage agent or after cleavage. As discussed
in further detail below, the sample of step (a) may comprise a
sub-cellular fraction. In this way, the method further comprises
a step of sub-cellular fractionation before step (a). The sample
of step (a) may be prepared by liquid chromatography of either 
a crude fraction or a sub-cellular fraction. A preferred method 
of determining the signature sequence is by mass spectrometry and 
this may be preceded by a high pressure liquid chromatography 
step to resolve the peptide fragments. Alternatively, the 
peptide fragments may be subjected to ion exchange chromatography 
before step (c), followed by sequencing by either mass
spectrometry or other methods. 

In accordance with the method of the present invention, the 
predetermined number of amino acid residues required to 
constitute the signature sequence will vary according to the size 
of the polypeptide population. Preferably, the predetermined 
number of amino acid residues is from 3 to 30, more preferably 
3 to 6. 

The present invention further provides a method for identifying 
polypeptides in a test sample. The method comprises 
characterising the polypeptides as described above and comparing 
the signature sequences and relative positions of the cleavage 
site obtained thereby with the signature sequences and relative 
positions of the cleavage site of known polypeptides in order to 
identify the or each polypeptide in the test sample. This method 
can be used to identify a single unknown polypeptide or a 
population of unknown polypeptides by comparing their 
characteristics (i.e. their signature sequences and relative
positions of cleavage site) with those of previously identified
polypeptides. It is envisaged that a database of such
characteristics can readily be compiled.
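
A minimal sketch of such a database lookup is given below. It assumes the C-terminus as reference terminus and models trypsin, in simplified form, as cutting after every lysine (K) or arginine (R); real trypsin has further rules, for example reduced cleavage before proline, which are ignored here. The protein names, sequences and function names are all hypothetical. Each polypeptide is keyed by its signature (the first k residues running from the cleavage site) together with the distance of the cleavage site from the reference terminus.

```python
from collections import defaultdict


def c_terminal_key(seq: str, k: int = 4) -> tuple:
    """Return (signature, distance) for the C-terminal tryptic fragment.

    The cut is placed after the last K or R that is not itself the
    final residue; with no such residue the whole chain is the fragment.
    """
    end = len(seq) - 1
    cut = max(seq.rfind("K", 0, end), seq.rfind("R", 0, end))
    fragment = seq[cut + 1:]
    return fragment[:k], len(fragment)


def build_index(proteins: dict, k: int = 4) -> dict:
    """Map each (signature, distance) key to the proteins that produce it."""
    index = defaultdict(list)
    for name, seq in proteins.items():
        index[c_terminal_key(seq, k)].append(name)
    return index


# Hypothetical reference sequences, for illustration only.
known = {
    "protA": "MASTKGLWRAEDQ",
    "protB": "MKTAYRAVLDEGS",
    "protC": "MGGSKAEDQAA",
}
index = build_index(known)

# protA and protC share the signature AEDQ but differ in the position
# of the cleavage site, so the pair of values still separates them.
print(index[("AEDQ", 4)])
print(index[("AEDQ", 6)])
```

The example is chosen so that two entries share a signature; it is the combination of signature and relative position that resolves them, as the method requires.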

In a further aspect, the present invention provides a method for 
assaying for one or more specific polypeptides in a test sample. 
The method comprises performing a method as described above, 
wherein the cleavage agent and relative position of the cleavage
site are predetermined and the signature sequence is determined
in step (c) by assaying for a predetermined sequence of amino 
acid residues running from the cleavage site. Preferably, the 
cleavage site and signature sequence are predetermined by 
selecting corresponding sequences from one or more known target 
polypeptides, such as those available from the database. 

Brief Description of the Drawings 

The invention will now be described in further detail, by way of 
example only, with reference to the accompanying drawings, in
which: 

FIGURE 1 shows a reaction scheme according to one embodiment of 
the invention; 

FIGURE 2 shows a reaction scheme according to another embodiment 
of the invention; 

FIGURE 3 shows a reaction scheme according to a simple embodiment 
of the invention; and 

FIGURE 4 shows a reaction scheme according to a variation of the 
embodiment shown in Figure 1. 

Brief Description of the Invention 
Protein Signatures: 

The essence of this system is that one can immobilise a 
population of proteins onto a solid phase substrate at one 
terminus of the molecule. Proteins are directional so a 
particular terminus can be chosen in a manner dependent on the 
chemistry of the immobilisation agent, for example the Edman 
reagent (phenyl isothiocyanate) can be used selectively to remove 
amino acids from the N-terminus of a protein; however, if phenyl
isocyanate is used the N-terminus is simply capped. A derivative
of this molecule that could be coupled to a cleavable linker on
a solid-phase substrate would allow a protein to be immobilised
at its N-terminus and subsequently removed by cleavage of the
linker. During peptide synthesis, the C-terminus is usually
immobilised as a benzyl ester, through the use of a chloromethyl 
group. Such chemistry may be adapted to immobilise proteins at 
this terminus, if desired. 

A population of immobilised proteins is then treated with a 
sequence specific peptidase such as trypsin to leave a population 
of N-terminal cleavage fragments. Such fragments can be
considered to be analogous to an expressed sequence tag for a 
protein. One can then sequence the resultant peptide signatures 
by mass spectrometry. Terminal fragments are most meaningful, 
in that the position of the resultant peptide in the protein is
known and the termini are usually accessible at the surface of 
most proteins. 
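
The generation of a single terminal fragment per protein can be illustrated in silico. The sketch below assumes a simplified trypsin rule (cleavage after every K or R, with the proline exception and missed cleavages ignored); the function names and the example sequence are hypothetical.

```python
def tryptic_sites(seq: str) -> list:
    """Indices i such that the cut falls between seq[i] and seq[i + 1]."""
    return [i for i, aa in enumerate(seq[:-1]) if aa in "KR"]


def n_terminal_fragment(seq: str) -> str:
    """Fragment retained when the N-terminus is immobilised."""
    sites = tryptic_sites(seq)
    return seq[: sites[0] + 1] if sites else seq


def c_terminal_fragment(seq: str) -> str:
    """Fragment retained when the C-terminus is immobilised."""
    sites = tryptic_sites(seq)
    return seq[sites[-1] + 1:] if sites else seq


protein = "MASTKGLWRAEDQ"            # hypothetical sequence
print(tryptic_sites(protein))        # [4, 8]
print(n_terminal_fragment(protein))  # MASTK
print(c_terminal_fragment(protein))  # AEDQ
```

Whichever terminus is chosen, exactly one fragment per chain is retained, which is what makes the fragment comparable to an expressed sequence tag.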

Sorting Proteins: 

Obviously a population of proteins extracted from a cell is going
to comprise a significant number of distinct species. If, as is
thought, there are roughly 15000 genes expressed in the average
human cell, one can expect as many proteins. Clearly one cannot
sequence all of these by mass spectrometry in a single step with
present technology. For this reason a protein population of such
size needs to be sorted into manageable sub-sets.

A generalised system for profiling proteins must attempt to 
resolve a protein population into reasonably discrete subsets of 
relatively uniform size. This is most readily achieved by
separation on the basis of global properties of proteins that
vary over a broad and continuous range, such as size and surface
charge, which are the properties used most effectively in 2-D gel
electrophoresis. Such separations can be achieved as rapidly or
more so using liquid chromatographic techniques. In fact, by
following one liquid chromatography separation by another, one
can resolve proteins in as many dimensions as one requires, since 
there is a great deal more flexibility in liquid chromatography 
separation systems, although one would ideally avoid too many 
separation steps to prevent sample loss. 

Sorting can be effected during extraction, after extraction of 
proteins from their source tissue or after cleavage of 
immobilised peptides. 

Sorting during cell fractionation: 

Proteins are intrinsically sorted in vivo, in terms of their 
compartmentalisation within a cell. Various techniques are 
available that allow one to sort proteins on the basis of their 
cellular compartments. Fractionation protocols involve various 
cell lysis techniques such as sonication, detergents or 
mechanical cell lysis that can be coupled to a variety of 
fractionation techniques, mainly centrifugation. Separation into 
membrane proteins, cytosolic proteins and the major membrane 
bound sub-cellular compartments, such as the nucleus and 
mitochondria, is standard practice. Thus one can effectively 
ignore certain classes of protein if one chooses, e.g. 
mitochondrial proteins are likely to be uninteresting in a lot
of cases. Membrane, cytosolic and nuclear compartments will be
of particular interest on the whole. 

Sorting after extraction:

Since proteins are highly heterogeneous molecules, numerous
techniques for separation of proteins are available on the basis
of size, hydrophobicity, surface charge and various combinations
of the above using liquid chromatography in its various
incarnations. Separation is effected by an assortment of solid
phase matrices derivatised with various functionalities that
adhere to and hence slow down the flow of proteins through the 
column on the basis of the properties above. Molecules are 
normally loaded into such columns in conditions favouring 
adhesion to the solid phase matrix and selectively washed off in 
steadily increasing quantities of a second buffer favouring
elution. In this way the proteins with the weakest interactions 
with a given matrix elute first. 

Various formats for liquid chromatography exist, but for greatest
speed of throughput and for the most discrete separations High
Pressure Liquid Chromatography (HPLC) formats are favoured. In
this format the matrix is designed to be highly incompressible
and, when derivatised, allows chromatographic separation to be
performed at extremely high pressures, which favours rapid and
discrete separation.

Sorting of cleaved peptides:

Liquid chromatography mass spectrometry (LCMS) is a well 
developed field. HPLC systems directly coupled to electrospray 
mass spectrometers are in widespread use. HPLC is a fast and 
effective way of resolving peptides after they have been cleaved 
from their immobilised state. 

Alternatively sorting peptides by ion exchange chromatography 
might be advantageous, in that short peptides could be separated 
in an almost sequence dependent manner: the amino acids that are 
ionisable have known pKa values and hence elution of peptides 
from such a column at a specific pH, would be indicative of the 
presence of particular amino acids in that sequence. For 
example, aspartate residues have a pKa of 3.9 and glutamate 
residues 4.3. Elution of a peptide at pH 4.3 would be indicative 
of the presence of glutamate in the peptide. These effects are 
sometimes masked in large proteins but should be distinct in 
short peptides, and hence would be extremely useful as sorting
features.
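
The relationship between pH and side-chain ionisation that underlies this sorting can be sketched with the Henderson-Hasselbalch equation. The pKa values below are those quoted in the text for aspartate and glutamate; treating each side chain as an isolated acid is an assumption, since pKa values shift with the local environment in a peptide. The function name is hypothetical.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Fraction of an acidic group carrying a negative charge at a given pH.

    Henderson-Hasselbalch: [A-] / ([HA] + [A-]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PKA = {"Asp": 3.9, "Glu": 4.3}  # values quoted in the text

for ph in (3.5, 3.9, 4.3, 5.0):
    charges = {res: round(fraction_deprotonated(pka, ph), 2)
               for res, pka in PKA.items()}
    print(ph, charges)
```

At a pH equal to its pKa a group is half ionised, so the pH at which a short peptide loses or gains its hold on an ion exchange matrix gives a hint as to which acidic residues it contains.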

Combination of the above techniques will allow various sorting 
protocols to be developed that will allow great control over the 
form of the protein profile generated. In this way, 
identification of most proteins expressed in a cell should be 
achievable. 

Sequencing of peptides by mass spectrometry:

Peptides can be readily sequenced directly by tandem mass 
spectrometry. In general, peptide mixtures are injected into the 
mass spectrometer by electrospray, which leaves them in the 
vapour phase. The first mass spectrometer acts as a filter 
selecting molecules to enter the second mass spectrometer on the 
basis of their mass-to-charge ratio, such that essentially only a
single species enters the second mass spectrometer at a time. 
On leaving the first mass spectrometer, the selected peptide 
passes through a collision chamber, which results in 
fragmentation of the peptide. Since fragmentation occurs mostly 
at the peptide bond, the pattern of fragments corresponds to a 
series of subspecies of peptides and amino acids that compose the 
original peptide. The distinct pattern of masses of single amino 
acids, 2-mers, 3-mers, etc. generated in the fragmentation of the 
peptide is sufficient to identify its sequence. 
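
How a fragment pattern encodes a sequence can be illustrated with an idealised ladder of singly charged N-terminal fragment ions (b-type ions). The sketch below uses standard monoisotopic residue masses; it assumes a complete, noise-free ladder, which real spectra rarely provide, and the function names are hypothetical. Successive mass differences are matched to residue masses to read the sequence back.

```python
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728


def b_ions(peptide: str) -> list:
    """m/z of the singly charged b-ion for each prefix of the peptide."""
    total = PROTON
    ions = []
    for aa in peptide:
        total += MONO[aa]
        ions.append(total)
    return ions


def read_ladder(ions: list, tol: float = 0.01) -> str:
    """Recover residues from the differences between successive ions.

    Residues of equal mass (leucine and isoleucine) cannot be told
    apart and are reported together in brackets.
    """
    seq = []
    previous = PROTON
    for mz in ions:
        delta = mz - previous
        matches = [aa for aa, m in MONO.items() if abs(m - delta) <= tol]
        seq.append(matches[0] if len(matches) == 1
                   else "[" + "/".join(matches) + "]")
        previous = mz
    return "".join(seq)


print(read_ladder(b_ions("SAMPER")))  # SAMPER
print(read_ladder(b_ions("GLK")))     # G[L/I]K
```

In practice ladders are incomplete and several ion series overlap, so interpretation would usually combine more than one series or be compared against predicted spectra.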

The end result is then that a population of proteins can be 
arbitrarily sorted into populations of peptides of convenient 
size to be fed into an electrospray tandem mass spectrometer for 
direct sequencing. Completion of such an analysis for an entire 
cell's proteins would give a profile of what proteins are present 
and in what relative quantities. Absolute quantitation could be 
achieved by 'spiking' a protein population with known quantities
of particular proteins known to be absent, e.g. plant proteins
in animal samples or vice versa, against which to calibrate
results. 
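
A single-point calibration against a spiked standard might be sketched as follows. This assumes that signal is proportional to amount and that every peptide responds equally in the instrument, neither of which is stated in the text and both of which would need checking in practice. The names, signal values and units are hypothetical.

```python
def quantify(signals: dict, spike_name: str, spike_amount: float) -> dict:
    """Convert relative signals into absolute amounts using one spike.

    Assumes a linear response with the same response factor for every
    species; the spike itself is omitted from the result.
    """
    factor = spike_amount / signals[spike_name]
    return {name: signal * factor
            for name, signal in signals.items()
            if name != spike_name}


# Hypothetical ion counts; the spike is 10 pmol of a plant protein
# added to an animal sample.
signals = {"plant_spike": 2000.0, "protA": 500.0, "protB": 4000.0}
amounts = quantify(signals, "plant_spike", 10.0)
print(amounts)
```

Spiking at several known quantities instead of one would allow the linearity assumption to be tested.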

Protein Signatures: 

This invention provides a method of capturing a population of 
proteins onto a solid phase substrate by one terminus of each 
protein in the population. This invention also provides a method 
of cleaving proteins that have been derivatised at one terminus 
with an agent that can be used to immobilise that terminus on a 
solid phase substrate. This allows a single peptide for each 
protein in a population to be captured onto a solid phase 
substrate; thus peptides from the chosen terminus can be separated
from other peptides generated by the cleavage step and can be 
isolated. This invention also provides a method to allow all the 
peptides generated in a cleavage step that are not from the 
reference terminus to be captured leaving a single terminal 
peptide per protein free in solution for analysis. 

A population of peptides generated according to the methods of 
this invention can be analysed in a number of ways, preferably by
mass spectrometry. 

Two forms of analysis are preferred. The first is to determine 
peptide mass fingerprints for the population of signature 
peptides generated. In this method the mass of each peptide, 
preferably the accurate mass, is determined. A significant
proportion of signature peptides should be uniquely identified
by this form of analysis. Any mass peaks that are unknown can be
further characterised by the second form of preferred analysis. 
Ions of a specific mass can be selected for collision induced 
dissociation in a tandem mass spectrometer. This technique can 
be used to determine sequence information for a peptide. 
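
The first form of analysis can be sketched as a lookup of an observed mass against the calculated masses of candidate signature peptides. The residue masses are standard monoisotopic values; the candidate peptides, the 10 ppm tolerance and the function names are assumptions made for illustration.

```python
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq: str) -> float:
    """Monoisotopic mass of the neutral peptide: residues plus one water."""
    return sum(MONO[aa] for aa in seq) + WATER


def match_mass(observed: float, candidates: dict, ppm: float = 10.0) -> list:
    """Names of candidates whose calculated mass lies within the tolerance."""
    hits = []
    for name, seq in candidates.items():
        expected = peptide_mass(seq)
        if abs(observed - expected) / expected * 1e6 <= ppm:
            hits.append(name)
    return hits


# Hypothetical signature peptides; protA and protC have the same
# composition in a different order and therefore the same mass.
candidates = {"protA": "AEDQ", "protB": "AVLDEGS", "protC": "QDEA"}

print(match_mass(461.17578, candidates))
print(match_mass(peptide_mass("AVLDEGS"), candidates))
```

The first lookup returns two names, because accurate mass alone cannot separate peptides of identical composition; this is the kind of peak that the second form of analysis, collision induced dissociation, is needed to resolve.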

Capturing Peptides: 

This invention provides methods that exploit derivatisation of
proteins with various agents, including existing peptide
sequencing reagents, to isolate a single 'signature' peptide from
each member of a population of proteins. This invention may be
practised in two formats. The methods of this invention allow a 
reference terminus to be selected from the proteins in a 
population. In the first format this reference terminus may be 
derivatised with an immobilisation agent. If the proteins 
derivatised in this manner are treated with a sequence specific 
cleavage agent to generate peptides, the peptides from the 
reference termini of the proteins in a mixture can be 
specifically captured leaving the remaining peptides free in 
solution. This first format is discussed in the following section 
headed "Format 1". In the second format a single peptide sample
per protein is generated by capturing the peptide fragments that 
are not from the chosen reference terminus, thus leaving the 
signature peptides free in solution. 

Format 1: 

In the simplest embodiment of this invention as shown 
schematically in Figure 3, a population of proteins is reacted 
with a modified sequencing agent specific for one terminus of 
each protein in the population. The modified sequencing agent
carries an immobilisation agent in order that proteins 
derivatised with the sequencing agent may be captured onto a 
solid phase support. The captured proteins may then be cleaved 
with a sequence specific cleavage agent. This cleavage step will 
generate a series of peptide fragments in solution and will leave 
a single peptide per protein captured on the solid phase support. The
peptides free in solution are then washed away. The immobilised 
peptides can then be released from the solid phase support by 
completing the sequencing reaction for the coupled terminal amino 
acid. The Edman reagent (phenyl isothiocyanate) could be modified
to carry an immobilisation agent: the phenyl ring could be
substituted with a group linked to an appropriate immobilisation
effector such as biotin. A population of proteins derivatised 
with this reagent could be cleaved with trypsin. The derivatised 
terminal peptides could then be immobilised on an avidinated 
solid phase support allowing underivatised peptides to be washed 
away. The peptides could then be released from the solid phase 
support by disrupting the avidin-biotin reaction. This will 
leave N-terminal peptides free in solution. These peptides can 
then be analysed by mass spectrometry. It may be desirable to 
fractionate the peptides prior to mass spectrometry but this 
fractionation step is optional. Alternatively a modified C- 
terminal sequencing agent might be used to capture proteins by 
the C-terminus. The C-terminus is generally not post- 
translationally modified and so may be the preferred terminus to 
capture a population of proteins. Further embodiments of this 
invention are discussed below.

C-terminal sequencing agents:

Unmodified C-terminal sequencing agents can be used to generate 
a signature peptide. A further embodiment of the present
invention is as follows and is described schematically in Figure
1. In the first step a protein population extracted from a tissue
is loosely immobilised onto a membrane, such as a PVDF membrane. 
The solvents used to extract proteins from a tissue sample are 
generally very harsh, usually containing agents such as urea, 
thiourea and detergents, since proteins have widely varying 
solubilities. Immobilising extracted proteins onto a membrane 
allows them to be washed with other solvents prior to 
modification. The protein population, thus captured, is then
derivatised with a coupling agent, such as diphenyl
phosphoroisothiocyanatidate from Hewlett-Packard (Miller et al.,
Techniques in Protein Chemistry VI 219-227), in a method that
is essentially the same as that which one would use for a normal 
sequencing reaction for a single protein giving 
peptidylacylisothiocyanates for all proteins. The coupling 
reagent also reacts with other free carboxyl groups also giving 
acylisothiocyanate derivatives. The coupling agent may, however, 
react incompletely with some carboxylic acid side chains. It may, 
therefore, be desirable to perform additional derivatisation
steps using more reactive reagents to ensure that all free 
carboxyl groups are derivatised. This variation is shown in 
Figure 4. The derivatised protein population is then treated 
with pyridine to effect ring closure of the terminal
acylisothiocyanate derivative. One can then cleave the C-terminal 
residue by addition of a cleavage agent such as 
trimethylsilanolate, from Hewlett-Packard, which cleaves the 
terminal amino acid from each protein releasing the 
thiohydantoin-amino acid derivative of the terminal amino acid.
This exposes a free carboxyl at the penultimate residue of each
protein. This can be specifically derivatised with biotin using
5-(biotinamido)pentylamine since all other carboxyl groups are
derivatised. In this way all the proteins in a population can be
derivatised at the C-terminus with biotin. The biotinylated
population, still on the PVDF membrane is then treated with an 
appropriate sequence specific cleavage agent. Trypsin is 
generally used for mass spectrometry applications as this 
generally leaves the N-terminal side of the cleavage site
protonated, which is desirable. Trypsin specifically cleaves
adjacent to basic residues. If an enzyme is used the immobilised
peptides would have to be washed with some form of physiological
buffer to allow trypsin to function. This will leave a population
of cleaved peptides, some of which are biotinylated, which can be
desorbed from the PVDF membrane into solution. The biotinylated
peptides can be captured using a solid phase matrix derivatised
with monomeric avidin. Non-immobilised peptides can then be
washed away, leaving an immobilised population of C-terminal
peptides which comprise the tag used to identify proteins in a
population. After washing away free peptides, the immobilised
tags can be released from the solid phase support by addition of 
acid which disrupts the biotin/avidin interaction - monomeric 
avidin is best for this purpose. In an alternative embodiment the 
biotinylated peptides can be captured on an avidinated support 
prior to sequence specific cleavage. 
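
By way of illustration only, the selection achieved by Format 1 
can be mimicked in silico to predict which tag a known protein 
should yield. The following Python sketch is not part of the 
described method. It assumes an idealised trypsin that cleaves 
after every lysine (K) or arginine (R), ignores missed cleavages 
and the proline exception, and uses a made-up sequence; the 
function names are hypothetical. 

```python
def tryptic_peptides(sequence):
    """Split a sequence after every K or R (idealised trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing piece that does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


def format1_tag(sequence):
    """Predict the biotinylated C-terminal tag for one protein.

    Assumption: the terminal residue is removed first, the newly
    exposed carboxyl is biotinylated, and only the last tryptic
    peptide therefore carries biotin.
    """
    if len(sequence) < 2:
        raise ValueError("need at least two residues")
    truncated = sequence[:-1]  # terminal amino acid cleaved off
    return tryptic_peptides(truncated)[-1]


if __name__ == "__main__":
    protein = "MKTAYRAKQLG"  # hypothetical sequence
    print(tryptic_peptides(protein))
    print(format1_tag(protein))
```

Only the last peptide of the truncated chain is returned because 
it is the only one expected to bind the avidinated support. 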

N-terminal sequencing agents: 

N-termini of a large proportion of cellular proteins are blocked. 
For the purposes of profiling those proteins whose N-termini are 
not blocked one can use the corresponding N-terminal sequencing 
agents to derivatise amino groups including the terminal amino 
group. The terminal amino acid can be cleaved and the newly 
exposed amine at the penultimate amino acid can be derivatised 
with an immobilisation agent. The biotinylated proteins can then 
be cleaved and the terminal signature peptides can be captured 
and analysed. This would however be limited to those N-termini 
that are not already blocked. 



Format 2: 

This method is shown schematically in Figure 2. In this method 
a reagent that derivatises carboxyl residues is used to cap all 
carboxyl residues including the C-terminal carboxyl group in the 
protein population of interest. The protein population is then 
cleaved with trypsin or another sequence specific cleavage 
reagent that cleaves at the peptide bond to generate an amino and 
carboxyl group on the C-terminal and N-terminal fragments 
respectively. At this stage all peptides except the terminal 
peptides, which are capped, will have a free carboxyl. These free 
carboxyls can be derivatised with 5-(biotinamido)pentylamine 
or some other immobilisation agent. If biotin is used then one 
can capture all the biotinylated non C-terminal peptides onto a 
solid phase matrix derivatised with avidin. An avidinated 
affinity column in-line with a mass spectrometer would allow 
C-terminal peptides to be selectively eluted directly into the 
mass spectrometer for analysis. 
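
The subtractive logic of Format 2 can be sketched in the same 
way. The Python fragment below is a hypothetical model, not the 
method itself: it assumes every carboxyl present before cleavage 
is capped, that cleavage after K or R is complete, and that each 
newly formed carboxyl is biotinylated. Under those assumptions 
the only peptide that escapes capture is the original C-terminal 
peptide. All names are illustrative. 

```python
def cleave(sequence, sites="KR"):
    """Cut after each residue listed in `sites` (idealised cleavage)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in sites:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def format2_partition(sequence):
    """Return (captured, flow_through) peptides for one capped protein.

    Cleavage creates a fresh carboxyl on every peptide except the
    last one, whose carboxyl was capped before cleavage.
    """
    peptides = cleave(sequence)
    last = len(peptides) - 1
    labelled = [
        {"peptide": p, "biotinylated": i < last}
        for i, p in enumerate(peptides)
    ]
    captured = [x["peptide"] for x in labelled if x["biotinylated"]]
    flow_through = [x["peptide"] for x in labelled if not x["biotinylated"]]
    return captured, flow_through


if __name__ == "__main__":
    captured, flow_through = format2_partition("MKTAYRAKQLG")
    print("captured on avidin:", captured)
    print("eluted to the spectrometer:", flow_through)
```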

This technique is equally applicable to generating peptide tags 
from the N-terminus of a population of proteins. Reagents which 
derivatise amine groups can be used to selectively cap all amine 
groups on a protein including the N-terminal amine group. 
Cleavage will expose amines in non-terminal peptides which can 
be derivatised with biotin allowing selective capture of non 
N-terminal peptides. This is important since many proteins are 
modified at the N-terminus and the N-terminal amine is often 
inaccessible to reagents. Thus selectively capturing non 
N-terminal peptides is a means of generating a signature at the 
N-terminus. 

The reagents to derivatise amines and carboxyls are also simpler 
than those necessary for the coupling agents used in sequencing 
reactions. 

Immobilisation agents: 

It is possible to capture derivatised peptides with a variety of 
chemical agents. In the discussion of the methods of this 
invention biotin has been chosen as an exemplary immobilisation 
agent due to its highly specific interactions with avidin. Other 
immobilisation agents besides biotin are compatible with the 
methods of this invention. The following are examples and the 
invention is not limited to these. 

A linker to hexahistidine would allow peptide tags to be captured 
onto a coordinated metal ion derivatised column. Various 
antibody-antigen interactions could be used as well where an 
antibody or antigen is tagged onto the penultimate amino acid 
rather than biotin. 

Antibodies against derivatives: 

The most common N-terminal modification is acetylation. It should 
be possible to raise an antibody against N-terminally acetylated 
peptides to permit these to be captured using an affinity column 
derivatised with such an antibody. In order to capture 
substantially all proteins one can derivatise the remaining 
proteins in a sample, that are not already acetylated, with an 
acetylation agent. The derivatised proteins can then be cleaved 
with chymotrypsin or another sequence specific agent (trypsin 
does not cleave acetylated cleavage sites of proteins). An 
anti-N-terminal acetylation antibody immobilised on an 
appropriate matrix could be used to generate an affinity column. 
Such a column could be used to capture peptide signatures with 
acetylated N-termini after their source proteins have been 
cleaved. 

To capture C-terminal peptides one could raise an antibody 
against thiohydantoin derivatives of peptides which could be used 
to selectively capture a peptide from a protein that had been 
derivatised with a coupling agent for sequencing prior to 
cleavage with trypsin or another sequence specific cleavage 
agent. 



Derivitisation of proteins: 

The methods of this invention include derivitisation steps which 
are required to ensure that the reference terminus of each 
protein in a population is specifically derivatised with an 
immobilisation agent in the first format or, in the second 
format, to ensure that the reference terminus is specifically 
blocked from reaction with an immobilisation agent. Additional 
derivitisation steps may also be performed. These may be 
desirable if fractionation of signature peptides is to be 
performed prior to mass spectrometry analysis. There are two 
important factors that should be considered with regard to any 
fractionation steps. These factors are the resolution of the 
fractionation step and the consequent sample loss imposed by the 
fractionation. 

Certain chromatographic techniques are 'sticky' when used for the 
separation of peptides, that is to say a proportion of the sample 
is retained on the separation matrix. It is possible to reduce 
sample loss of this kind by derivitising the groups that are 
involved in adhesion to the separation matrix. That is to say, 
if one is using an ion exchange chromatography separation one can 
derivatise ionic and polar side chains with reagents that 
increase their hydrophobicity thus reducing affinity to the 
matrix. This will, however, reduce the resolution of the 
separation. 

It is desirable to ensure that only one mass peak per peptide 
appears in the mass spectrum generated by analysis of a 
population of signature peptides. It may, therefore, be desirable 
to derivatise polar and ionic side chains of signature peptides 
in order to reduce the number of ionisation states accessible to 
those peptides. This step should help promote the formation of 
a single ion species per signature peptide. 

It may also be desirable to add a group to each signature peptide 
to increase the sensitivity of the mass spectrometry analysis. 
A particularly good 'sensitising' group to add to a peptide would 
be a tertiary ammonium ion which is a positively charged entity 
with excellent detection properties. 

Pre-Sorting Steps: 

This technology can be used to profile peptide populations 
generated in numerous ways. Various fractionation techniques 
exist to sub-sort proteins on the basis of certain features. Of 
particular interest is the analysis of signalling pathways. 
Phosphorylation of proteins by kinases is a feature of many 
signalling pathways. Proteins that can be phosphorylated by a 
kinase often have a short phosphorylation motif that a kinase 
recognises. Antibodies exist that bind to such motifs, some 
binding phosphorylated forms while others bind the 
non-phosphorylated state. Antibody affinity columns or 
immunoprecipitation of kinase target sub-populations followed by 
profiling would be of great interest in identifying these 
proteins and in monitoring their metabolism simultaneously in 
time resolved studies of live model systems. 

Many proteins exist as complexes and analysis of such complexes 
is often tricky. A cloned protein that is a putative member of 
a complex allows one to generate an affinity column with that 
protein to trap other proteins that bind to it. This profiling 
technology is eminently suited to analysis of such captured 
protein complexes. 

Kits including antibody affinity columns to analyse signal 
transduction or membrane location by capturing proteins with the 
appropriate post-translational modifications are envisaged either 
as a pre-sorting step or as a capture step after cleavage of a 
protein population with a sequence specific cleavage agent. 

Chromatographic techniques : 

Having generated peptide tags from a population of proteins it 
is then desirable to analyse the resultant tags. Chromatography 
is an optional step in the analysis of a population of peptide 
signatures prior to mass spectrometry but may be quite desirable 
depending on the configuration of the mass spectrometer used. 
Two important features are required of any chromatographic stage 
in a protein profiling method: high resolution and minimal sample 
loss. Resolution generates information and also reduces the 
complexity of the peptide tag population entering the mass 
spectrometer. The second feature is that there is minimal loss 
of sample in the chromatographic separation, since such loss 
would reduce the sensitivity of the technique to low frequency 
peptides in the population under analysis. 

Derivitisation of proteins: 

Certain chromatographic techniques are 'sticky' when used for the 
separation of peptides, that is to say a proportion of the sample 
is retained on the separation matrix. It is possible to reduce 
sample loss of this kind by derivitising the groups that are 
involved in adhesion to the separation matrix. That is to say, 
if one is using an ion exchange chromatography separation one can 
derivitise ionic and polar side chains with reagents that 
increase their hydrophobicity thus reducing affinity to the 
matrix. This feature needs to be balanced against the need for 
resolution though. 

The use of the C-terminal sequencing agents to derivitise the 
free carboxyl groups will reduce the adhesion between such 
peptides and a cation exchange resin. This may mean that cation 
exchange chromatography is advantageous as a chromatographic 
separation step. 

One can quite readily acetylate amine residues to achieve 
similar effects for anion exchange chromatography. 

Analysis of peptides by mass spectrometry: 

Ionisation Techniques: 

In general peptide mixtures are injected into the mass 
spectrometer by electrospray or MALDI TOF, which leaves them in 
the vapour phase. 

Electrospray Ionisation: 

Electrospray ionisation requires that the dilute solution of 
biomolecule be 'atomised' into the spectrometer from an insertion 
probe, i.e. in a fine spray. The solution is, for example, 
sprayed from the tip of a needle in an electrostatic field 
gradient. The mechanism of ionisation is not fully understood but 
is thought to work broadly as follows. The electrostatic field 
charges droplets formed at the probe tip promoting atomisation. 
In the stream of nitrogen the solvent is evaporated. With a small 
droplet, this results in concentration of the biomolecule. Given 
that most biomolecules have a net charge this increases the 
electrostatic repulsion of the dissolved protein. As evaporation 
continues this repulsion ultimately becomes greater than the 
surface tension of the droplet and the droplet 'explodes' into 
smaller droplets. The electrostatic field helps to further 
overcome the surface tension of the charged droplets. The 
evaporation continues from the smaller droplets which, in turn, 
explode iteratively until essentially the biomolecules are in the 
vapour phase, as is all the solvent. 

Atmospheric Pressure Chemical Ionisation: 

An ionisation technique appropriate for use with LC-MS for 
analysing peptides is Atmospheric Pressure Chemical Ionisation 
(APCI). This is an electrospray based technique where the 
ionisation chamber is modified to include a discharge electrode 
which can be used to ionise the bath gas which in turn will 
collide with the vaporised sample molecules increasing ionisation 
of the sample. 

Fast Atom Bombardment: 

This is an ionisation technique that is quite similar to APCI and 
is highly compatible with samples in solution. Typically, a 
continuous flow of liquid from a capillary electrophoresis column 
or an HPLC column can be pumped through an insertion probe to a 
hole or a frit at its tip where the solution is bombarded by 
accelerated atoms or ions, usually of xenon or caesium. Collision 
with the dissolved sample results in transfer of kinetic energy 
to and ionisation of the sample. 

Matrix Assisted Laser Desorption Ionisation (MALDI): 

MALDI requires that the biomolecule solution be embedded in a 
large molar excess of a photo-excitable 'matrix'. The 
application of laser light of the appropriate frequency (266 nm 
beam for nicotinic acid) results in the excitation of the matrix 
which in turn leads to excitation and ionisation of the embedded 
biomolecule. This technique imparts a significant quantity of 
translational energy to ions, but tends not to induce excessive 
fragmentation despite this. Accelerating voltages can again be 
used to control fragmentation with this technique though. 

MALDI techniques can be supported in two ways. One can embed mass 
labelled DNA in a MALDI matrix, where the labels themselves are 
not specifically excitable by laser or one can construct labels 
that contain the necessary groups to allow laser energisation. 
The latter approach means the labels do not need to be embedded 
in a matrix before performing mass spectrometry. Such groups 
include nicotinic, sinapinic or cinnamic acid moieties. MALDI 
based cleavage of labels would probably be most effective with 
a photocleavable linker as this would avoid a cleavage step prior 
to performing MALDI mass spectrometry. The various excitable 
ionisation agents have different excitation frequencies so that 
a different frequency can be chosen to trigger ionisation from 
that used to cleave the photolysable linker. These excitable 
moieties are easily derivitised using standard synthetic 
techniques in organic chemistry so labels with multiple masses 
can be constructed in a combinatorial manner. 

All of the above techniques are routinely used with peptides and 
proteins and are preferred methods of ionisation with this 
invention. 

Mass Spectrometric Sensitivity and Quantitation of peptide tags: 

The end result is then that a population of proteins can be 
arbitrarily sorted into populations of peptides of convenient 
size to be fed into a mass spectrometer for analysis. Completion 
of such an analysis for an entire cell's proteins would give a 
profile of what proteins are present and in what relative 
quantities. Absolute quantitation could be achieved by 'spiking' 
a protein population with known quantities of particular 
proteins, known to be absent, e.g. plant proteins in animal 
samples or vice versa, against which to calibrate results. 
Internal quantities can be determined by measuring relative 
quantities of certain proteins present at relatively fixed 
concentrations in most cells such as histones. Various techniques 
coupled to certain mass spectrometer geometries permit good 
quantitation with a mass spectrometer. These issues are dealt 
with fully in GB 9719284.3. 
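
As a purely illustrative sketch of the 'spiking' calibration 
described above, the following Python fragment converts peak 
intensities to absolute amounts using one spiked standard. It 
assumes, as a simplification not stated in this document, that 
signal is linear in amount and that every tag has the same 
response factor as the spike. The tag names, intensities and the 
10 pmol spike are invented for the example. 

```python
def absolute_amounts(intensities, spike_name, spike_amount_pmol):
    """Scale peak intensities to amounts using a spiked standard.

    intensities: mapping of tag name to measured peak intensity.
    spike_name: key of the spiked standard in `intensities`.
    spike_amount_pmol: known amount of the standard that was added.
    """
    spike_signal = intensities[spike_name]
    if spike_signal <= 0:
        raise ValueError("spike standard was not detected")
    # Single-point calibration: pmol per unit of intensity.
    factor = spike_amount_pmol / spike_signal
    return {
        name: signal * factor
        for name, signal in intensities.items()
        if name != spike_name
    }


if __name__ == "__main__":
    measured = {"plant_spike": 200.0, "tag_A": 100.0, "tag_B": 400.0}
    print(absolute_amounts(measured, "plant_spike", 10.0))
```

In practice a calibration curve built from several spike levels 
would be more robust than the single point used here. 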

Mass Analyser Geometries: 

Mass spectrometry is a highly diverse discipline and numerous 
mass analyser configurations exist, which can often be 
combined in a variety of geometries to permit analysis of complex 
organic molecules such as the peptide tags generated with this 
invention. 

Accurate Mass Measurement: 

Double focussing mass spectrometers are capable of measuring 
molecular masses to a very high accuracy, i.e. fractions of a 
dalton. This permits one to distinguish molecules with identical 
integer mass but different atomic compositions with ease as 
fractional differences in the mass of different atomic isotopes 
allow such distinctions. For determining the molecular masses of 
a population of peptide tags, this technique may be very 
effective as it would allow identification of a significant 
proportion of peptides without requiring any sequencing even if 
some do have the same integral mass. The few ambiguous peptides 
that remain could be analysed by tandem mass spectrometry as 
discussed below. 
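
The point about identical integer mass can be made concrete with 
a short calculation. The Python sketch below is an illustration 
only: it uses standard monoisotopic residue masses, takes the 
integer part of the monoisotopic mass as a stand-in for integer 
mass, and compares two invented tripeptides that differ only in 
glutamine (Q) versus lysine (K). The function names are 
hypothetical. 

```python
# Standard monoisotopic residue masses in daltons (subset).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def same_integer_mass(a, b):
    """True when both masses share the same integer part."""
    return int(peptide_mass(a)) == int(peptide_mass(b))


def resolvable(a, b, accuracy_da):
    """True when the mass difference exceeds the stated accuracy."""
    return abs(peptide_mass(a) - peptide_mass(b)) > accuracy_da


if __name__ == "__main__":
    for tag in ("AQG", "AKG"):
        print(tag, round(peptide_mass(tag), 5))
    print(same_integer_mass("AQG", "AKG"))
    print(resolvable("AQG", "AKG", accuracy_da=0.01))
```

The two tags differ by roughly 0.036 Da, so an instrument 
accurate to 0.01 Da would separate them while one accurate to 
only 0.1 Da would not. 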

Sequencing of peptide tags by Tandem mass spectrometry: 

Peptides can be readily sequenced by tandem mass spectrometry. 
Tandem mass spectrometry describes a number of techniques in 
which ions from a sample are selected by a first mass analyser 
on the basis of their mass/charge ratio for further analysis by 
induced fragmentation of those selected ions. The fragmentation 
products are analysed by a second mass analyser. The first mass 
analyser in a tandem instrument acts as a filter selecting ions 
to enter the second mass analyser on the basis of their 
mass/charge ratio, such that essentially a species of only a 
single mass/charge ratio, usually only a single peptide ion, 
enters the second mass analyser at a time. On leaving the first 
mass 
analyser, the selected peptide passes through a collision 
chamber, which results in fragmentation of the peptide. Since 
fragmentation occurs mostly at the peptide bond, the pattern of 
fragments corresponds to a series of subspecies of peptides and 
amino acids that compose the original peptide. The distinct 
pattern of masses of single amino acids, 2-mers, 3-mers, etc. 
generated in the fragmentation of a peptide is sufficient to 
identify its sequence. 

ION SOURCE -> MS1 -> COLLISION CELL -> MS2 -> ION DETECTOR 
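
How a fragment mass pattern encodes a sequence can be sketched 
as follows. This Python example is an idealised illustration and 
not the analysis performed by any particular instrument: it 
assumes a complete series of singly charged N-terminal (b-type) 
fragment ions, no noise and no modifications, and it cannot 
separate leucine from isoleucine, which have identical mass (both 
are reported as L). The function names and the test peptide are 
hypothetical. 

```python
# Standard monoisotopic residue masses in daltons. Isoleucine is
# omitted because it is isobaric with leucine.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728


def b_ion_ladder(peptide):
    """m/z of singly charged b-type fragments, shortest first."""
    ladder, total = [], PROTON
    for residue in peptide:
        total += RESIDUE_MASS[residue]
        ladder.append(total)
    return ladder


def read_sequence(ladder, tolerance=0.01):
    """Infer residues from the gaps between successive ladder masses."""
    sequence, previous = [], PROTON
    for mz in ladder:
        gap = mz - previous
        matches = [r for r, m in RESIDUE_MASS.items()
                   if abs(m - gap) <= tolerance]
        # "?" marks a gap that matches no single residue.
        sequence.append(matches[0] if matches else "?")
        previous = mz
    return "".join(sequence)


if __name__ == "__main__":
    ladder = b_ion_ladder("GASPK")
    print([round(m, 4) for m in ladder])
    print(read_sequence(ladder))
```

Each gap between neighbouring fragments equals one residue mass, 
which is why a complete ladder is sufficient to read the 
sequence. 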

Various tandem geometries are possible. Conventional 'sector' 
instruments can be used where the electric sector provides the 
first mass analyser stage, the magnetic sector provides the 
second mass analyser, with a collision cell placed between the 
two sectors. This geometry is not ideal for peptide sequencing. 
Two complete sector mass analysers separated by a collision cell 
could be used for peptide sequencing. A more typical geometry 
used is a triple quadrupole where the first quadrupole filters 
ions for collision. The second quadrupole in a triple quadrupole 
acts as a collision chamber while the final quadrupole analyses 
the fragmentation products. This geometry is quite favorable. 
Another more favorable geometry is a Quadrupole/Orthogonal Time 
of Flight tandem instrument where the high scanning rate of a 
quadrupole is coupled to the greater sensitivity of a TOF mass 
analyser to identify the products of fragmentation. 

Sequencing with Ion Traps: 

Ion Trap mass spectrometers are a relative of the quadrupole 
spectrometer. The ion trap generally has a 3 electrode 
construction - a cylindrical electrode with 'cap' electrodes at 
each end forming a cavity. A sinusoidal radio frequency potential 
is applied to the cylindrical electrode while the cap electrodes 
are biased with DC or AC potentials. Ions injected into the 
cavity are constrained to a stable circular trajectory by the 
oscillating electric field of the cylindrical electrode. However, 
for a given amplitude of the oscillating potential, certain ions 
will have an unstable trajectory and will be ejected from the 
trap. A sample of ions injected into the trap can be sequentially 
ejected from the trap according to their mass/charge ratio by 
altering the oscillating radio frequency potential. The ejected 
ions can then be detected allowing a mass spectrum to be 
produced. 

Ion traps are generally operated with a small quantity of a 'bath 
gas', such as helium, present in the ion trap cavity. This 
increases both the resolution and the sensitivity of the device 
by collision with trapped ions. Collisions both increase 
ionisation when a sample is introduced into the trap and damp the 
amplitude and velocity of ion trajectories keeping them nearer 
the centre of the trap. This means that when the oscillating 
potential is changed, ions whose trajectories become unstable 
gain energy more rapidly, relative to the damped circulating ions 
and exit the trap in a tighter bunch giving narrower, larger 
peaks. 

Ion traps can mimic tandem mass spectrometer geometries, in fact 
they can mimic multiple mass spectrometer geometries allowing 
complex analyses of trapped ions. A single mass species from a 
sample can be retained in a trap, i.e. all other species can be 
ejected and then the retained species can be carefully excited 
by superimposing a second oscillating frequency on the first. 
The excited ions will then collide with the bath gas and will 
fragment if sufficiently excited. The fragments can then be 
analysed further. One can retain a fragment ion for further 
analysis by ejecting other ions and then exciting the fragment 
ion to fragment. This process can be repeated for as long as 
sufficient sample exists to permit further analysis. It should 
be noted that these instruments generally retain a high 
proportion of fragment ions after induced fragmentation. These 
instruments and FTICR mass spectrometers (discussed below) 
represent a form of temporally resolved tandem mass spectrometry 
rather than spatially resolved tandem mass spectrometry which is 
found in linear mass spectrometers. 

For the purposes of protein profiling a peptide population, an 
ion trap is quite a good instrument. A sample of peptide tags can 
be injected into the spectrometer. Peptide tags that are expected 
to appear in a profile, such as housekeeping proteins or histone 
peptides from eukaryote cell samples, can be ejected specifically 
and quantified rapidly. The remaining peptides can be scanned. 
Totally new peptides can then be selectively retained from 
subsequent samples of the peptide population and can be induced 
to fragment allowing sequence data for that peptide to be 
acquired. Alternatively an Ion Trap can form the first stage of 
a tandem geometry instrument. 

Fourier Transform Ion Cyclotron Resonance Mass Spectrometry 
(FTICR MS) : 

FTICR mass spectrometry has similar features to ion traps in that 
a sample of ions is retained within a cavity but in FTICR MS the 
ions are trapped in a high vacuum chamber by crossed electric and 
magnetic fields. The electric field is generated by a pair of 
plate electrodes that form two sides of a box. The box is 
contained in the field of a superconducting magnet which in 
conjunction with the two plates, the trapping plates, constrains 
injected ions to a circular trajectory between the trapping 
plates, perpendicular to the applied magnetic field. The ions are 
excited to larger orbits by applying a radiofrequency pulse to 
two 'transmitter plates' which form two further opposing sides of 
the box. The cycloidal motion of the ions generates corresponding 
electric fields in the remaining two opposing sides of the box 
which comprise the 'receiver plates'. The excitation pulses 
excite ions to larger orbits which decay as the coherent motion 
of the ions is lost through collisions. The corresponding signals 
detected by the receiver plates are converted to a mass spectrum 
by Fourier transform analysis. 
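
The conversion from detected signal to mass rests on the 
relation between an ion's orbital frequency and its mass/charge 
ratio. The Python sketch below illustrates this using the 
textbook unperturbed cyclotron relation f = zeB/(2*pi*m), which 
is an assumption brought in for the example and is not taken 
from this document; it neglects trapping-field corrections. The 
7 tesla field and the 1000 Da ion are arbitrary example values. 

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def cyclotron_frequency_hz(mass_da, charge, field_tesla):
    """Unperturbed cyclotron frequency of an ion in a uniform field."""
    return (charge * ELEMENTARY_CHARGE * field_tesla
            / (2 * math.pi * mass_da * DALTON))


def mass_from_frequency(frequency_hz, charge, field_tesla):
    """Invert the relation: ion mass in daltons from a frequency."""
    return (charge * ELEMENTARY_CHARGE * field_tesla
            / (2 * math.pi * frequency_hz * DALTON))


if __name__ == "__main__":
    f = cyclotron_frequency_hz(1000.0, charge=1, field_tesla=7.0)
    print(f"{f:.0f} Hz")
    print(round(mass_from_frequency(f, charge=1, field_tesla=7.0), 6))
```

Because frequency falls as mass rises, each peak in the Fourier 
transformed signal maps onto one mass/charge value. 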

For induced fragmentation experiments these instruments can 
perform in a similar manner to an ion trap - all ions except a 
single species of interest can be ejected from the trap. A 
collision gas can be introduced into the trap and fragmentation 
can be induced. The fragment ions can be subsequently analysed. 
Generally fragmentation products and bath gas combine to give 
poor resolution if analysed by FT of signals detected by the 
'receiver plates', however the fragment ions can be ejected from 
the cavity and analysed in a tandem configuration with a 
quadrupole, for example. 

For protein profiling FTICR MS could be used and may be 
advantageous as these instruments have a very high mass 
resolution allowing for accurate mass measurement so that 
peptides with the same integer mass but different atomic 
compositions can be resolved. Furthermore unidentified peptide 
tags can be subsequently analysed by fragmentation. 

Protein immobilisation: 

A great deal of knowledge has been accumulated about specific 
protein chemistries particularly in the area of organic synthesis 
of peptides. 

• R.B. Merrifield, Science 232: 341 - 347, 1986. 

• S.B.H. Kent, "Chemical Synthesis of Peptides and Proteins", 
Annu. Rev. Biochem. 1988. 57: 957 - 989. 

Linkers : 

An important feature of this invention is cleavable linkers to 
their relevant biomolecules. Photocleavable linkers are 
particularly desirable as they allow for rapid, reagentless 
cleavage. For references, see: 

• Theodora W. Greene, "Protective Groups in Organic Synthesis", 
1981, Wiley-Interscience. 

On photoremovable groups: 

• Patchornik, J. Am. Chem. Soc. 92: 6333 - , 1970. 

• Amit et al, J. Org. Chem. 39: 192 - , 1974. 

Liquid Chromatography: 

• R. Scopes, "Protein Purification: Principles and Practice", 
Springer-Verlag, 1982. 

• M. Deutscher, "Guide to Protein Purification", Academic Press, 
1990. 

Mass Spectrometry: 

Electrospray mass spectrometry is the preferred technique for 
sequencing peptides since it is a very soft technique and can be 
directly coupled to the liquid phase molecular biology used in 
this invention. For a full discussion of mass spectrometry 
techniques see: 

• K. Biemann, "Mass Spectrometry of Peptides and Proteins", Annu. 
Rev. Biochem. 1992. 61: 977 - 1010. 

• R.A.W. Johnstone and M.E. Rose, "Mass Spectrometry for chemists 
and biochemists" 2nd edition, Cambridge University Press, 1996. 

Experiment 

Outline of embodiment of protein profiling 
This comprises a system where 

(i) a protein has its carboxyl groups protected, the last amino 
acid removed leaving just one carboxyl group free at the cleaved 
terminus. 

(ii) This will be reacted with a biotinylation reagent, so that 
the carboxy terminus is labelled with biotin. 

(iii) The protein is fragmented with a protease to leave peptide 
fragments, only the carboxyl one being biotinylated. 

The biotin is used to attach the C terminal fragment to 
immobilised streptavidin, or preferably monomeric avidin, from 
which it can be released with mild acid and made available for 
MS-MS. 

All reagents are available, and the chemistry is generally 
well-known as follows: 

(i) The technique of carboxy-terminal sequencing of proteins is 
established. We note that the method of Boyd et al. (Boyd, VL, 
Bozzini, M, Guga, PJ, DeFranco, RJ, Yuan, P-M, Loudon, GM and 
Nguyen, D; J. Org Chem, 60, 2581, (1995)) blocks the side chain 
carboxyls of aspartate and glutamate residues by amidation during 
removal of the terminal amino acid. 

(ii) Biotinylation of the free carboxyl group at the carboxy 
terminus may be achieved using 5-(biotinamido)pentylamine/ 
1-ethyl-3-[3-dimethylaminopropyl]carbodiimide hydrochloride, 
which is marketed by Pierce & Warriner (Lee, KY, Birckbichler, 
PJ and Patterson, MK, Clin Chem, 34, 906 (1988)) for such a 
purpose. 

(iii) Protease fragmentation of proteins on a membrane is an 
established technique (Sutton, CW, Pemberton, KS, Cottrell, JS, 
Corbett, JM, Wheeler, CH, Dunn, MJ and Pappin, DJ, 
Electrophoresis, 16, 308, (1995)), and Millipore Corporation 
produce Immobilon-CD and other PVDF membranes for that purpose. 
Monomeric avidin is produced by Pierce and Warriner, and allows 
release of biotinylated molecules using 2mM biotin in phosphate 
buffered saline. 

The remaining step in the method is the use of PVDF membranes (as 
used for trypsinisation) in lieu of Zitex membranes for the 
sequencing reaction (i). 

Methodology 

Binding of lysozyme to PVDF membrane 

0.5mm squared pieces of PVDF (Millipore) were wetted with 
isopropanol and incubated in 20mg/ml lysozyme (Pharmacia) in PBS 
at room temperature for 30 minutes. The membranes were then air 
dried and stored at 4°C until used. 

Modification (carboxyl group protection) of lysozyme bound to 
PVDF 

Modification solution was prepared by mixing 62mg of 
2-ethyl-5-phenylisoxazolium-3'-sulfonate (Aldrich) with 50ul of 
diisopropylethylamine (Aldrich) in 2mls of CH3CN. 100ul of 
modification solution was added to each membrane and 
incubated at room temperature for 4 hours. 

Following incubation 900ul of water was added and each membrane 
was gently shaken at room temperature for 30 minutes. Each 
membrane was then transferred to 50ul of CH3CN, 450ul of water 
was added and the membranes were gently shaken at room 
temperature for 30 minutes. 

Each membrane was then transferred to 500ul of 2% 
trifluoroacetic acid and incubated at room temperature overnight. 

Trypsin digest 

Each membrane was transferred to 250ul of 25mM ammonium 
bicarbonate pH7.6 solution and gently shaken at room temperature 
for 15 minutes. 

Each protein/protein containing membrane was added/transferred 
to 200ul of ammonium bicarbonate solution pH7.6 containing 5ug 
of trypsin and incubated at 37°C overnight. 

Elution of protein/peptide fragments from membrane 

Each membrane was transferred to 100ul of 50% formic acid/50% 
ethanol solution and incubated at room temperature for 30 minutes 
to remove the protein/peptides. The membranes were then removed 
and 300ul of water added to the 50% formic acid/50% ethanol 
solution containing the protein/peptides. 

Analysis 

The following were analysed by reversed phase HPLC: 

40ug of trypsin in PBS; 40ug of lysozyme in PBS; 40ug of lysozyme 
digested with trypsin; 40ug of trypsin digested with trypsin; 
membrane bound modified lysozyme digested with trypsin; membrane 
put through the modification protocol without lysozyme and 
digested with trypsin; membrane bound lysozyme unmodified and 
digested with trypsin; membrane bound lysozyme modified without 
trypsin digestion. 

Results 

We have now performed the operation using PVDF membranes in lieu 
of Zitex membranes for the sequencing reaction (i). We have 
found that the reversed phase HPLC chromatogram for lysozyme 
(used as a typical protein) obtained after treatment with the 
sequencing reactions on a PVDF membrane and trypsinisation, from 
which the chromatogram for the same process in the absence of 
lysozyme has been subtracted, is similar to that obtained for 
lysozyme trypsinised directly. Hence the technologies are 
compatible and can be used to generate 'signature' peptides for 
MS-MS identification (data not shown). 

KEY TO THE DRAWINGS: 

FIGURE 1 

Step 1: Extract proteins with harsh solvents and capture 
        extracted proteins onto a PVDF membrane 
Step 2: Loosely immobilised proteins can be washed to dispose 
        of harsh solvents 
Step 3: Treat proteins with C-terminal coupling agent 
Step 4: Treat derivitised proteins with cyclisation reagent 
        and then cleave terminal amino acid from derivitised 
        protein 
Step 5: Biotinylate newly exposed penultimate amino acid 
        carboxyl group 
Step 6: Wash membrane bound proteins to remove chemical agents 
        and cleave proteins with trypsin in physiological 
        buffer 
Step 7: Capture terminal fragments onto avidinated beads 
Step 8: Wash away free peptides then release captured peptide 
        'tags' for analysis 
Step 9: Analyse by MS or LC/MS/MS or MS/MS 



FIGURE 2 
Step 1: Extract proteins with harsh solvents and capture
        extracted proteins onto a PVDF membrane
Step 2: Loosely immobilised proteins can be washed to dispose
        of harsh solvents
Step 3: Treat proteins with C-terminal coupling agent
Step 4: Wash membrane bound proteins to remove chemical agents
        and cleave proteins with trypsin or other sequence
        specific cleavage agent in physiological buffer
Step 5: Biotinylate newly exposed carboxyl termini
Step 6: Capture terminal fragments onto avidinated beads in an
        affinity column for example
Step 7: Analyse eluted C-terminal fragments by MS or LC/MS/MS or MS/MS

FIGURE 3

Step 1: Extract proteins with harsh solvents and capture
        extracted proteins onto a PVDF membrane
Step 2: Loosely immobilised proteins can be washed to dispose
        of harsh solvents
Step 3: Treat proteins with C-terminal coupling agent carrying
        immobilisation effector
Step 4: Wash membrane bound proteins to remove chemical agents
        and cleave proteins with trypsin in physiological
        buffer
Step 5: Capture terminal fragments onto avidinated beads
Step 6: Wash away free peptides then release captured peptide
        'tags' for analysis
Step 7: Analyse by MS or LC/MS/MS or MS/MS



FIGURE 4 

Step 1: Extract proteins with harsh solvents and capture
        extracted proteins onto a PVDF membrane
Step 2: Loosely immobilised proteins can be washed to dispose
        of harsh solvents
Step 3: Treat proteins with C-terminal coupling agent
Step 4: Treat coupled proteins with derivitisation reagent to
        ensure all exposed carboxyls are capped
Step 5: Treat derivitised proteins with cyclisation reagent
        and then cleave terminal amino acid from derivitised
        protein
Step 6: Biotinylate newly exposed penultimate amino acid
        carboxyl group
Step 7: Wash membrane bound proteins to remove chemical agents
        and cleave proteins with trypsin in physiological
        buffer
Step 8: Capture terminal fragments onto avidinated beads
Step 9: Wash away free peptides then release captured peptide
        'tags' for analysis
Step 10: Analyse by MS or LC/MS/MS or MS/MS



WO 98/32876 PCT/GB98/00201


CLAIMS: 

1. A method for characterising polypeptides, which comprises: 

(a) treating a sample comprising a population of one or more 
polypeptides with a cleavage agent which is known to recognise 
in polypeptide chains a specific amino acid residue or sequence 
and to cleave at a cleavage site, whereby the population is 
cleaved to generate peptide fragments; 

(b) isolating a population of the peptide fragments which bear
at one end a reference terminus comprising either only a
C-terminus or only an N-terminus and which bear at the other end
the cleavage site proximal to the reference terminus; and

(c) determining a signature sequence of at least some of the 
isolated fragments, which signature sequence is the sequence of 
a predetermined number of amino acid residues running from the 
cleavage site; 

wherein the signature sequence and the relative position of the 
cleavage site to the reference terminus characterise the or each 
polypeptide.
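
By way of illustration only, steps (a) to (c) can be mimicked in silico for the case where the reference terminus is the C-terminus. The sketch below is not part of the claimed method. It assumes the commonly cited trypsin rule (cleavage after K or R unless the next residue is P), takes the relative position of the cleavage site to be the length of the C-terminal fragment, and uses a hypothetical function name and an invented sequence.

```python
import re

# Assumed trypsin rule: cut after K or R unless the next residue is P.
TRYPSIN_SITE = r"(?<=[KR])(?!P)"


def c_terminal_signature(sequence, n_residues, site_pattern=TRYPSIN_SITE):
    """Return (signature, fragment_length) for the C-terminal fragment.

    signature       -- first n_residues read from the cleavage site
    fragment_length -- residues between that cleavage site and the C-terminus
    """
    # Zero-width matches mark cut positions; drop any cut at the very end.
    cuts = [m.start() for m in re.finditer(site_pattern, sequence)
            if m.start() < len(sequence)]
    start = cuts[-1] if cuts else 0   # no site: the whole chain is the fragment
    fragment = sequence[start:]
    return fragment[:n_residues], len(fragment)


# Invented sequence, for illustration only.
print(c_terminal_signature("AAKGGRPLLKDEFGH", 3))   # ('DEF', 5)
```

Here the last cleavage site lies after the lysine at residue 10, so the C-terminal fragment is DEFGH, and a signature of three residues together with the fragment length of five characterises the chain.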

2. A method according to claim 1, wherein the reference 
terminus is attached to a solid phase support to immobilise the 
population of polypeptides or peptide fragments thereof.

3. A method according to claim 2, wherein the population of 
polypeptides is immobilised before treatment with the cleavage 
agent.

4. A method according to claim 2 or claim 3, wherein the 
reference terminus is attached to the solid phase support by: 

(i) treating the polypeptides with a blocking agent to block 
all exposed reference groups, which comprise either carboxyl 
groups or primary amine groups; 

(ii) cleaving the reference terminal amino acids to expose 
unblocked reference termini; and

(iii) treating the unblocked reference termini with an
immobilisation agent capable of coupling to the solid phase
support; wherein step (b) comprises binding the treated reference
termini to the solid phase support and removing unbound peptide
fragments.

5. A method according to claim 1, which further comprises:

(i) preparing the sample of step (a) by pre-treating the
polypeptides with a blocking agent to block all exposed reference 
groups, which comprise either carboxyl groups or primary amine 
groups, so that subsequent treatment of the sample with the 
cleavage agent generates peptide fragments bearing unblocked 
reference termini; 

(ii) treating the unblocked reference termini with an 
immobilisation agent capable of coupling to a solid phase 
support; and

(iii) binding the peptide fragments containing the unblocked 
reference termini to the solid phase support; wherein step (b) 
comprises eluting unbound peptide fragments therefrom.

6. A method according to claim 4 or claim 5, wherein the
immobilisation agent comprises a biotinylation agent. 

7. A method according to any one of claims 4 to 6, wherein the 
reference group is carboxyl. 

8. A method according to any one of the preceding claims, 
wherein the cleavage agent comprises a peptidase. 

9. A method according to any one of the preceding claims, 
wherein the sample of step (a) comprises a sub-cellular fraction.

10. A method according to any one of the preceding claims, which 
further comprises preparing the sample of step (a) by liquid 
chromatography.

11. A method according to any one of the preceding claims,
wherein the signature sequence is determined by mass 
spectrometry. 

12. A method according to claim 11, wherein the mass 
spectrometry is preceded by a high pressure liquid chromatography 
step to resolve the peptide fragments. 

13. A method according to any one of claims 1 to 11, wherein the 
peptide fragments are subjected to ion exchange chromatography 
before step (c).

14. A method according to any one of the preceding claims, 
wherein the predetermined number of amino acid residues is from 
3 to 30.

15. A method for identifying polypeptides in a test sample, 
which comprises characterising the polypeptides in accordance 
with a method according to any one of the preceding claims, 
comparing the signature sequences and relative positions of the 
cleavage site obtained thereby with the signature sequences and 
relative positions of the cleavage site of further polypeptides 
in order to identify the or each polypeptide in the test sample. 
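
The comparison step can be pictured as a simple table lookup. The following is a minimal sketch under stated assumptions, not the claimed procedure: each polypeptide is represented by a pair of signature sequence and relative position of the cleavage site, and the reference entries, protein names and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical reference entries: name -> (signature sequence, position).
reference = {
    "protein_A": ("DEF", 5),
    "protein_B": ("DEF", 9),
    "protein_C": ("LGL", 7),
}


def build_index(entries):
    """Map each (signature, position) pair to the names that share it."""
    index = defaultdict(list)
    for name, key in entries.items():
        index[key].append(name)
    return index


def identify(observed, index):
    """Return the candidate names for each observed (signature, position)."""
    return {key: sorted(index.get(key, [])) for key in observed}


index = build_index(reference)
print(identify([("DEF", 5), ("LGL", 7), ("QQQ", 3)], index))
# {('DEF', 5): ['protein_A'], ('LGL', 7): ['protein_C'], ('QQQ', 3): []}
```

In this toy table protein_A and protein_B share the signature DEF and are told apart only by the position of the cleavage site, which is why both values are used as the lookup key.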

16. A method for assaying for one or more specific polypeptides
in a test sample, which comprises performing a method according 
to any one of claims 1 to 14, wherein the cleavage agent and 
relative position of the cleavage site is predetermined and the 
signature sequence is determined in step (c) by assaying for a 
predetermined sequence of amino acid residues running from the 
cleavage site. 

17. A method according to claim 16, wherein the cleavage site 
and signature sequence are predetermined by selecting 
corresponding sequences from one or more known target 
polypeptides. 